Digitization Errors In Hungarian Documents
Abstract
Our task was to analyze a particular digitizing system, to identify the types of errors that emerge during the process, and to determine how these errors affect the searchability of the digitized documents. We have set up a testbed suitable for the automatic processing of digitized texts on a large scale. In this paper we briefly introduce the methodology of document digitization, emphasizing the sources of error in the process, and sketch the results obtained from our test system, especially the Hungarian-language-dependent characteristics of the emerging errors.
Similar resources
Towards the Creation of a Robust Search Index for Digitalized Documents
The simultaneous support of electronic and paper-based document handling is a natural demand of current filing and document management systems. To support the better management of search and retrieval functions and to reduce the high costs of digitizing, the Department of Distributed Systems of SZTAKI analysed the different kinds of error that emerged during the digitization process of Hungaria...
High-resolution video mosaicing for documents and photos by estimating camera motion
Recently, digitizing documents and photographs from paper has become very important for digital archiving and personal data transmission over the internet. To realize easy and high-quality digitization of documents and photographs, we propose a novel digitization method that uses a movie captured by a hand-held camera. In our method, first, 6-DOF (degree of freedom) position and posture parameters ...
Book-Adaptive and Book-Dependent Models to Accelerate Digitization of Early Music
Optical music recognition (OMR) enables early music collections to be digitized on a large scale. The workflow for such digitization projects also includes scanning and preprocessing, but the cost of expert human labour to correct automatic recognition errors dominates the cost of these other two steps. To reduce the number of recognition errors in the OMR process, we present an innovative appl...
‘As We May Digitize’ — Institutions and Documents Reconfigured
This article frames digitization as a knowledge organization practice in libraries and museums. The primarily discriminatory practices of museums are compared with the non-discriminatory practices of libraries when managing their respective cultural heritage collections. Digitization of cultural heritage brings new practices, tools and arenas that reconfigure and reinterpret not only the collec...
Publication date: 2007